Abstract

Using Random Matrix Theory, we build a covariance matrix between stocks of the BM&F-Bovespa (Bolsa de Valores, Mercadorias e Futuros de São Paulo) that is cleaned of some of the noise due to the complex interactions between the many stocks and to the finiteness of the available data. We also use a regression model to remove the market effect due to the common movement of all stocks. These two procedures are then used to build portfolios of stocks based on Markowitz's theory, with the aim of obtaining better predictions of future risk from past data. This is done for years of both low and high volatility of the Brazilian market, from 2004 to 2010.

1 Introduction

Modern portfolio theory is largely based on Markowitz's ideas [1]-[3], where a portfolio of various equities is built on the principle of minimizing risk given an expected return. Risk is assessed through the volatility of each stock in the portfolio, as well as through their covariances. Preference is given to stocks that have negative or low covariance with each other, which leads to diversification of the equities held in a particular portfolio. Both volatility and covariance are integrated into the covariance matrix, which is built using the stock returns of past data. This matrix is used to predict the risk of a portfolio, and the prediction usually differs from the realized risk of the same portfolio.

Three problems arise from this approach. The first is that past data reflect the market as it was, not as it will be. The theory thus assumes that future events will mimic past events, which is usually not true, since past data incorporate neither the release of news nor the current mood of the market. There is not much one can do about this, but in order to minimize the effects of events that might change the behavior of a market, one should not use past data that are too old.

This leads us to the second problem, which is the noise associated with past data that arises purely from the fact that the available data are finite. Since one cannot go back in time indefinitely, and it would not be advisable to do so given the discussion in the preceding paragraph, there is only a limited amount of data (in our case, price quotations) from which to build a covariance matrix. The problem becomes even more severe if we consider that an efficient portfolio should be built from many and diverse equities. A third source of noise comes from the complex interactions between the many elements of a stock market: traders, news, foreign markets, and the very prices of stocks interact to guide the price of a stock. Those interactions are usually too complex to be accommodated by any economic model.

All this noise is incorporated into the covariance matrix used to forecast the risk of a particular portfolio. If one can clean that matrix of some of the noise, one is able to make better predictions of risk. Many studies have been made of the influence of noise and other factors on the covariance matrix in the building of portfolios [4]-[9]. Most approaches to the problem involve reducing the dimensionality of the covariance matrix by introducing some structure into it, obtained by principal component analysis or by the separation of stocks into economic sectors, among other means (see [10] and [11] for two of these approaches).

A technique first developed for the study of the nuclei of the atoms of heavier elements, called Random Matrix Theory [12], compares the eigenvalues of a correlation matrix with those of a correlation matrix built from a purely random matrix. From such a comparison, one may discern elements which are clearly not random, and study them separately. This technique has been applied to a number of complex systems and, in particular, to financial markets. Among the many results obtained is the building of portfolios whose predicted risk, based on past data, more closely resembles the realized risk of the future market [13], [14], and the technique has been successfully applied to stocks [15], [16] and mutual funds [17].

Besides cleaning the correlation matrix, we use a regression model to remove the market effect from the asset returns. This procedure allows the correlation matrix to be estimated with greater precision, since what remains is only the part of the dependence that is due to the assets themselves, which generates more reliable forecasts of the risk of a portfolio.

The contribution of this article is a method capable of improving the risk forecasts of a portfolio built with assets of the Brazilian stock market, based on past data. This method involves three steps: (1) the removal of the market effect from the assets; (2) the cleaning of the correlation matrix, which encodes the structure of the dependence of the assets being considered, based on Random Matrix Theory; and (3) the construction of portfolios using Markowitz's theory and the cleaned correlation matrix. In this article, we calculate portfolios of stocks with and without the removal of the market effect so as to compare both results.

In order to analyze the suitability of the proposed method, we use the daily returns of stocks of the BM&F-Bovespa with 100% liquidity, in pairs of consecutive years ranging from 2004 to 2010. For each year being analyzed, we build a portfolio using data from the previous year in order to forecast the risk for that year, and the forecasted risk is then compared with the risk realized in that year. As the data used in this article include periods of both low and high volatility in the BM&F-Bovespa, in particular the data collected during the Subprime Mortgage Crisis of 2007 and 2008, we are able to study how this technique of cleaning the correlation matrix performs in times of high volatility.

The article is organized as follows. Section 2 introduces the basic concepts of Random Matrix Theory, which are then applied to the building of portfolios (according to Markowitz) with and without cleaning the correlation matrix when short selling is allowed, and describes the regression used to remove the market effect. Section 3 shows the results of the portfolio forecasts using data from 2004 to 2010 and compares them to the realized risks. The article ends with a conclusion and comments on years of high volatility.

2 Methodology 

In this section, we briefly describe the method proposed for the construction of portfolios by cleaning the 
correlation matrix and removing the market effect, aiming at a better forecasting of risk based on the previous 
behaviors of the assets. We use the year 2004 as an example of the application of such method in this section. 

2.1 Random matrix theory 

Random matrix theory had its origins in 1953, in the work of the Hungarian-born physicist Eugene Wigner [18], [19]. He was studying the energy levels of complex atomic nuclei, such as uranium, and had no means of calculating the distances between those levels. He then assumed that the distances between energy levels should be similar to the ones obtained from a random matrix expressing the connections between the many energy levels. Surprisingly, he was then able to make sensible predictions about how the energy levels related to one another.

The theory was later developed, with many and surprising results arising. Today, random matrix theory is applied to quantum physics, nanotechnology, quantum gravity, and the study of the structure of crystals, and may have applications in ecology, linguistics, and many other fields where a huge amount of apparently unrelated information may be understood as being somehow connected. The theory has also been applied to finance in a series of works dealing with the correlation matrices of stock prices, and also with risk management in portfolios [20]-[23] (for a recent review on the subject, see [24]).

The first result of the theory that we shall mention is that, given an $L \times N$ matrix of random numbers drawn from a Gaussian distribution with mean zero and standard deviation $\sigma$, then, in the limit $L \to \infty$ and $N \to \infty$ such that $Q = L/N$ remains finite and greater than 1, the eigenvalues $\lambda$ of the correlation matrix built from such a matrix have the following probability density function, called the Marchenko-Pastur distribution [25]:

$$P(\lambda) = \frac{Q}{2\pi\sigma^2}\,\frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda} , \qquad (1)$$

where

$$\lambda_\pm = \sigma^2\left(1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}\right) , \qquad (2)$$

and $\lambda$ is restricted to the interval $[\lambda_-, \lambda_+]$.

Figure 1 shows some of these distributions for diverse values of $Q$ and $\sigma$.

Since the distribution (1) is only valid in the limit $L \to \infty$ and $N \to \infty$, eigenvalue distributions obtained from finite matrices will deviate from this behavior. Another source of deviations is the fact that financial time series are better described by non-Gaussian distributions, such as the Student's t or the Tsallis distribution.

In Figure 2, we compare the theoretical distribution for $Q = 10$ and $\sigma = 1$ to the distributions of the eigenvalues of three correlation matrices generated from finite $L \times N$ matrices such that $Q = L/N = 10$, whose elements are random numbers with mean zero and standard deviation 1.

Fig. 2. Histogram of eigenvalues for a generated correlation matrix and the Marchenko-Pastur theoretical distribution (solid line) for $Q = 10$ and $\sigma = 1$.
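
The comparison just described can be reproduced in a few lines. The following is only a minimal sketch, assuming NumPy is available and using the standard form of the Marchenko-Pastur density and its edges; the matrix sizes (1000 by 100), the seed and all function names are illustrative choices, not values taken from the text.

```python
import numpy as np


def marchenko_pastur_bounds(q, sigma=1.0):
    """Edges (lambda_minus, lambda_plus) of the density for Q = L/N > 1."""
    centre = 1.0 + 1.0 / q
    half_width = 2.0 * np.sqrt(1.0 / q)
    return sigma**2 * (centre - half_width), sigma**2 * (centre + half_width)


def marchenko_pastur_pdf(lam, q, sigma=1.0):
    """Density P(lambda); zero outside [lambda_minus, lambda_plus]."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    lo, hi = marchenko_pastur_bounds(q, sigma)
    pdf = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    pdf[inside] = (q / (2.0 * np.pi * sigma**2)
                   * np.sqrt((hi - lam[inside]) * (lam[inside] - lo))
                   / lam[inside])
    return pdf


def random_correlation_eigenvalues(n_obs, n_series, seed=0):
    """Eigenvalues of the correlation matrix of an n_obs x n_series Gaussian matrix."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_obs, n_series))
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)


# Hypothetical sizes giving Q = L / N = 10.
eigenvalues = random_correlation_eigenvalues(n_obs=1000, n_series=100)
lam_minus, lam_plus = marchenko_pastur_bounds(q=10.0)
# Fraction of simulated eigenvalues that fall inside the theoretical band.
share_inside = np.mean((eigenvalues >= lam_minus) & (eigenvalues <= lam_plus))
```

A histogram of `eigenvalues` plotted against `marchenko_pastur_pdf` evaluated on a grid between `lam_minus` and `lam_plus` gives a picture of the kind shown in Figure 2; with finite sizes a few eigenvalues may fall slightly outside the band.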

Consequently, real data will deviate from the theoretical probability distribution. Nevertheless, the theoretical result may serve as a benchmark for the results obtained empirically.

We shall now explain why random matrix theory is useful for portfolio building, starting with how it can be used to clean the correlation matrix of some of its noise. To do so, we consider the data for the year 2004, the first of the years considered in our study. For this period we chose stocks of the Bovespa (by then not yet merged with the BM&F) which were traded every trading day during the years 2004 and 2005 (2004 provides the past data used to predict the risk in 2005). Those stocks are listed in Appendix A, totaling 61 stocks.

For each stock, we calculated the returns, more precisely the log-returns, given by

$$R_t = \ln(P_t) - \ln(P_{t-1}) \approx \frac{P_t - P_{t-1}}{P_{t-1}} , \qquad (3)$$

where $P_t$ is the closing price of a stock on trading day $t$. Those returns were then normalized in order to obtain zero mean and standard deviation one by using the formula

$$X_t = \frac{R_t - \mu_R}{\sigma_R} , \qquad (4)$$

where $\mu_R$ is the average of the time series used and $\sigma_R$ is its standard deviation. This is done in order to best compare the resulting correlation matrix with the theoretical one, which is built from a random matrix with zero mean and standard deviation $\sigma$ chosen to be equal to one. The correlation matrix (a 61 x 61 matrix) between the variables $X_t$ for the year 2004 was then calculated.
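
The preprocessing described above (log-returns, normalization, correlation matrix) can be sketched as follows. This is an illustration only: it assumes NumPy, the prices are invented numbers rather than Bovespa quotations, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used (the resulting correlation matrix does not depend on that choice).

```python
import numpy as np


def log_returns(prices):
    """Log-returns R_t = ln(P_t) - ln(P_{t-1}) for a (days x stocks) price array."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)


def normalize(returns):
    """X_t = (R_t - mean) / std, applied column by column."""
    returns = np.asarray(returns, dtype=float)
    return (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)


# Made-up closing prices for three stocks over five trading days.
prices = np.array([
    [10.0, 20.0, 5.0],
    [10.5, 19.8, 5.1],
    [10.3, 20.4, 5.0],
    [10.8, 20.1, 5.3],
    [11.0, 20.6, 5.2],
])
returns = log_returns(prices)            # shape (4, 3)
x = normalize(returns)                   # zero mean, unit standard deviation
corr = np.corrcoef(x, rowvar=False)      # N x N correlation matrix
```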

The distribution of the eigenvalues of the correlation matrix thus obtained is shown in Figure 3 (left picture). The eigenvalues are also plotted in order of magnitude in the right picture of Figure 3. The shaded area indicates the region predicted by theory for data corresponding to a purely random behavior of the normalized returns.

Fig. 3. Left: histogram of eigenvalues for the correlation matrix of 61 stocks in 2004 and the Marchenko-Pastur theoretical distribution (solid line). Right: eigenvalues for the correlation matrix of 61 stocks in 2004 and the purely random region.

We have $L = 248$ days of data for each of the $N = 61$ stocks, so that $Q = 248/61 \approx 4.06$. The probability density function for a random matrix with $L \to \infty$ and $N \to \infty$ with $Q \approx 4.06$ is also plotted in Figure 3, so that we may compare the result of pure noise with the one obtained for our data. The minimum ($\lambda_-$) and maximum ($\lambda_+$) values of the support of the probability density function are given by

$\lambda_- = 0.254$ and $\lambda_+ = 2.238$.
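
As a quick arithmetic check of these two values, assuming the standard Marchenko-Pastur edges $\lambda_\pm = \sigma^2 (1 + 1/Q \pm 2\sqrt{1/Q})$ with $\sigma = 1$ and $Q = 248/61$:

```latex
\lambda_\pm = 1 + \frac{61}{248} \pm 2\sqrt{\frac{61}{248}}
            \approx 1.2460 \pm 0.9919 ,
\qquad
\lambda_- \approx 0.254 , \quad \lambda_+ \approx 2.238 .
```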

The first striking feature is that the largest eigenvalue is more than ten times larger than the maximum value predicted for a purely random correlation matrix. About 72% of the eigenvalues fall within the shaded region associated with pure noise, 15 of them fall below this region, and, besides the largest one, one more eigenvalue lies above it.

The eigenvectors $e_1$ and $e_2$ for the two largest eigenvalues, $\lambda_1 = 23.505$ and $\lambda_2 = 2.540$, are represented in Figure 4 (first two graphs). The white bars represent positive values and the gray bars represent negative ones.

The components of eigenvector $e_1$ are very similar for all the stocks considered, showing that all stocks contribute to this mode, which is considered "the market mode". For eigenvector $e_2$, one can see the prevalence of some stocks over others. In comparison, eigenvectors corresponding to the shaded region (Wishart region) do not show any preference for particular stocks.

The third and fourth graphs of Figure 4 show the eigenvectors associated with two eigenvalues that lie inside the region considered as noise, $\lambda_{15} = 0.853$ and $\lambda_{37} = 0.393$. Note that no structure of stocks is clearly defined.

Fig. 4. Eigenvectors of some selected eigenvalues: $\lambda_1$, $\lambda_2$ (largest), $\lambda_{15}$, $\lambda_{37}$ (noise region), and $\lambda_{60}$, $\lambda_{61}$ (lowest eigenvalues).

We also show the eigenvectors corresponding to the two lowest eigenvalues of the correlation matrix, $\lambda_{60} = 0.046$ and $\lambda_{61} = 0.039$ (last two graphs in Figure 4). These eigenvectors represent "portfolios" of low risk, in contrast to the eigenvectors of the two largest eigenvalues, which represent the oscillations of the market and the common behavior of a cluster of stocks that behave in a similar way. Eigenvector $e_{61}$ represents a portfolio where the investor buys PETR4 and short-sells PETR3, which are stocks of the same company, Petrobras, and buys ELET3 and short-sells ELET6, which also belong to a single company, Eletrobras. Eigenvector $e_{60}$ represents a portfolio where the investor buys VALE3 and ELET6 and short-sells VALE5 and ELET3, which again are two pairs of stocks of the same companies, and also buys PETR3 and short-sells PETR4.

Figure 5 shows the daily log-returns of portfolio $P_1$, built with eigenvector $e_1$, plotted against the log-returns of the Ibovespa, an index that describes the general behavior of the São Paulo Stock Exchange. The correlation between the two series is 0.9865, a very strong indication that the portfolio $P_1$ corresponds to a combination of stocks that behaves much like the market, although with a much larger volatility: the standard deviation of the returns of $P_1$ is 12.51%, and the standard deviation of the Ibovespa is 1.80%.

The situation changes if we consider a portfolio built with eigenvector $e_{37}$, which corresponds to the noisy part of the eigenvalue spectrum: the correlation between this portfolio, $P_{37}$, and the Ibovespa is 0.1824, and it has a standard deviation of 1.72%, very close to the standard deviation of the Ibovespa. For the portfolio $P_{61}$, built with eigenvector $e_{61}$, which corresponds to the lowest eigenvalue, the correlation with the Ibovespa is 0.0932, and its standard deviation is 0.44%. This portfolio presents the lowest correlation with the Ibovespa.

Fig. 5. Scatter plot of the returns of the portfolio built with $e_1$ against the Ibovespa returns.
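
The construction of an "eigenportfolio" and its comparison with a market index can be sketched as below. This is a toy example under stated assumptions: NumPy is assumed, the returns are synthetic (one common factor plus noise), the equal-weighted average of the synthetic stocks stands in for an index such as the Ibovespa, and the function name is hypothetical.

```python
import numpy as np


def eigenportfolio_returns(returns, k=0):
    """Returns of the portfolio whose weights are the k-th eigenvector
    (k = 0 is the largest eigenvalue) of the correlation matrix."""
    corr = np.corrcoef(returns, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest first
    weights = eigenvectors[:, order[k]]
    return returns @ weights, weights


# Synthetic data: a common "market" factor plus idiosyncratic noise.
rng = np.random.default_rng(1)
market = rng.normal(0.0, 0.02, size=250)
returns = market[:, None] + rng.normal(0.0, 0.01, size=(250, 20))
index = returns.mean(axis=1)  # stand-in for a market index

p1, w1 = eigenportfolio_returns(returns, k=0)
# The sign of an eigenvector is arbitrary, so compare in absolute value.
corr_with_index = abs(np.corrcoef(p1, index)[0, 1])
```

One possible reading of the larger volatility of $P_1$, offered here only as a remark: a unit-norm eigenvector with $N$ roughly equal components has weights summing to about $\sqrt{N}$ rather than to one, so the eigenportfolio is scaled up relative to an index.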



2.2 Building portfolios using Markowitz's Theory 

In this section, we build portfolios using the $N = 61$ stocks we are considering, based on the correlation matrix of their returns in the year 2004. According to the usual portfolio theory, we can obtain $w$, the vector of weights of the portfolio due to each stock, by fixing the portfolio return (RE) and minimizing the risk (RI) of the portfolio.

The return of the portfolio is given by

$$RE = w^T R , \qquad (5)$$

where $R$ is the vector of average returns of each stock in 2004. The risk is defined by the variance of the portfolio,

$$RI = w^T \Sigma_R\, w , \qquad (6)$$

where $\Sigma_R$ is the estimated covariance matrix of the $N$ stocks.

The risk is then minimized with the constraint that the sum of all weights in the portfolio should be equal to one,

$$\sum_{i=1}^{N} w_i = 1 . \qquad (7)$$

One can do that for several values of the average return, leaving the coordinates of w free to assume 
negative values, as well as positive ones, so that short selling is allowed. In Finance, this is not always possible, 
or sometimes it is limited, and so we shall consider the case of no short selling later on. 
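
With short selling allowed, the minimization has only equality constraints and can be solved as a linear system. The sketch below is one standard way of doing this (Lagrange multipliers), not necessarily the procedure used by the authors; it assumes NumPy, and the three-asset covariance matrix and mean returns are invented for illustration.

```python
import numpy as np


def min_variance_weights(cov, mean_returns, target_return):
    """Weights minimizing w' cov w subject to w' mean_returns = target_return
    and sum(w) = 1, with short selling allowed (weights may be negative)."""
    cov = np.asarray(cov, dtype=float)
    mu = np.asarray(mean_returns, dtype=float)
    n = len(mu)
    ones = np.ones(n)
    # Linear system from the first-order conditions with two multipliers.
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * cov
    kkt[:n, n] = mu
    kkt[:n, n + 1] = ones
    kkt[n, :n] = mu
    kkt[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [target_return, 1.0]])
    return np.linalg.solve(kkt, rhs)[:n]


# Illustrative three-asset inputs (not Bovespa data).
cov = np.array([[0.040, 0.006, 0.002],
                [0.006, 0.090, 0.010],
                [0.002, 0.010, 0.160]])
mu = np.array([0.05, 0.10, 0.15])
w = min_variance_weights(cov, mu, target_return=0.10)
risk = w @ cov @ w  # portfolio variance, the quantity RI
```

Repeating the call for a grid of `target_return` values traces out a curve of minimum risk against return. The case without short selling adds inequality constraints and needs a quadratic programming solver instead of a single linear solve.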

In order to build a portfolio, the covariance matrix of a period of time (usually some months) prior to the period of investment is used together with a forecast of the expected returns. Those returns, which are unknown, may be approximated by many means, with varying degrees of success. There is a vast literature on the forecasting of returns [2], but this does not concern us in our study of how to improve the prediction of risk. So, in order to restrict ourselves to the analysis of the correlation matrix, we shall assume that our prediction of returns is the best possible one, namely a perfect forecast of returns. Of course, if we had a perfect forecast of returns, and knew it to be perfect, we would not need any portfolio analysis. We use the perfect forecast here in order to compare different ways of calculating risk independently of the way one tries to forecast returns.

So, we first use the covariance matrix from 2004, together with the average returns of 2005 (perfect forecast of returns), in order to build minimum risk portfolios for 2005. Doing so, we build an efficient frontier, which is a curve whose coordinates are the minimum risk for a given return. We also use the data from 2005, which means perfect forecasts of both risk and return, in order to build an efficient frontier for the realized risk.

Figure 6 shows the predicted return and risk of portfolios using the 2004 covariances of stocks (dashed line) and the realized return and risk using data from 2005 (solid line). Note that, for a given return, the predicted risk is smaller than the realized risk. This may lead to a false perception of how risky an investment truly is, and may cause wrong decisions by the manager of the portfolio. The agreement of the curves (predicted and realized risk) can be measured by

$$AG = \frac{1}{n}\sum_{i=1}^{n} \frac{RI_i^{pred} - RI_i^{real}}{RI_i^{real}} , \qquad (8)$$

where $RI_i^{real}$ is the realized risk and $RI_i^{pred}$ is the predicted risk, both for $i = 1, \dots, n$ values of fixed returns. In our case, this number is AG = -0.176 (n = 100), which means that the predicted risk is, on average, 18% smaller than the realized risk.
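
A small sketch of how such an agreement measure could be computed is given below. It assumes that each deviation is taken relative to the realized risk, which is what the reading "X% smaller than the realized risk" suggests; the function name and the three pairs of risks are invented for illustration.

```python
def agreement(predicted_risks, realized_risks):
    """Mean relative deviation of predicted from realized risk,
    taken over the same set of fixed target returns."""
    if len(predicted_risks) != len(realized_risks):
        raise ValueError("both curves need the same number of points")
    terms = [(p - r) / r for p, r in zip(predicted_risks, realized_risks)]
    return sum(terms) / len(terms)


# Made-up risks on two frontiers, evaluated at the same three target returns.
predicted = [0.8, 1.6, 2.4]
realized = [1.0, 2.0, 3.0]
ag = agreement(predicted, realized)  # every prediction is 20% below the realized value
```

A negative value indicates that risk was underestimated on average, and a positive value that it was overestimated.
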

Fig. 6. Realized (solid line) and predicted (dashed line) returns and risks of portfolios for 2005.

2.3 Building portfolios with a cleaned correlation matrix

The situation may be improved by trying to remove some of the noise of the correlation matrices of the returns of 2004 and 2005. One way this can be done is by building a diagonal matrix $D$ whose diagonal elements are the eigenvalues of the original correlation matrix, but with all eigenvalues corresponding to noise (those between $\lambda_-$ and $\lambda_+$) replaced by their average [13]-[17]. In our present case, this average is $\bar{\lambda} = 0.748$ for the eigenvalues based on data from 2004 and $\bar{\lambda} = 0.790$ for the eigenvalues based on data from 2005. The cleaned correlation matrix is then built using the formula

$$C_{clean} = P D P^{-1} , \qquad (9)$$

where $P$ is the matrix whose columns are the eigenvectors of the original correlation matrix.
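
The cleaning step can be sketched as follows. This is a minimal illustration, assuming NumPy and the standard Marchenko-Pastur edges with $\sigma = 1$; the returns are synthetic (one common factor plus noise), only the sizes 248 and 61 are borrowed from the text, and the function name is hypothetical.

```python
import numpy as np


def clean_correlation(corr, q, sigma=1.0):
    """Replace eigenvalues inside [lambda_minus, lambda_plus] by their average
    and rebuild the matrix as P D P^{-1}."""
    corr = np.asarray(corr, dtype=float)
    lam_minus = sigma**2 * (1.0 + 1.0 / q - 2.0 * np.sqrt(1.0 / q))
    lam_plus = sigma**2 * (1.0 + 1.0 / q + 2.0 * np.sqrt(1.0 / q))
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    noise = (eigenvalues >= lam_minus) & (eigenvalues <= lam_plus)
    cleaned_values = eigenvalues.copy()
    if noise.any():
        cleaned_values[noise] = eigenvalues[noise].mean()
    # For a symmetric matrix the eigenvector matrix is orthogonal: P^{-1} = P^T.
    return eigenvectors @ np.diag(cleaned_values) @ eigenvectors.T


# Synthetic returns with one common factor (not market data).
rng = np.random.default_rng(2)
L, N = 248, 61
returns = 0.5 * rng.standard_normal((L, 1)) + rng.standard_normal((L, N))
corr = np.corrcoef(returns, rowvar=False)
cleaned = clean_correlation(corr, q=L / N)
```

Replacing the noisy eigenvalues by their average keeps the trace of the matrix unchanged, but the diagonal of the cleaned matrix is in general no longer exactly one; whether and how to rescale it is a choice the text does not specify.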

We now calculate the efficient frontier built with the covariance matrix obtained from the cleaned correlation matrix of 2004, together with the average returns of 2005 (perfect forecast of returns), and compare it with the realized curve, calculated with the covariance matrix obtained from the cleaned correlation matrix of 2005. The results are represented in Figure 7, where the predicted frontier is the dashed line and the realized one is the solid line.

The agreement measure has now gone from AG = -0.176 to AG = -0.102, which means that the predicted risk is, on average, 10% smaller than the realized risk. This is an improvement on the previous result and shows how the cleaning of the correlation matrix may help in building portfolios whose predicted risk, based on previous data, better accounts for the realized risk.

Fig. 7. Realized (solid line) and predicted (dashed line) returns and risks of portfolios for 2005 based on the cleaned correlation matrices.

2.4 Systemic risk 

One of the stylized facts concerning the returns of assets is the existence of volatility clustering. Generally, volatility clusters occur due to patterns of the stock market as a whole, and they make it difficult to forecast the risk of portfolios, since they reflect a dependence between the assets and the market, and not solely the dependence between assets. As an example, the prediction for 2008 using data from 2007 grossly underestimates the risk of 2008, for 2007 was a year of relatively low volatility and 2008 saw the height of the USA Subprime Mortgage Crisis. Similarly, the risk prediction for 2009 using data from 2008 overestimates the risk for 2009. Figure 8a shows the volatility of the Ibovespa (the index of the São Paulo Stock Exchange) over the years considered in this article. The volatility is calculated as the absolute value of the log returns of the index. It clearly shows regions of high and low volatility.

The most common way to remove this so-called systemic risk is to use a single index model, where all log returns $R_t$ are written as a first-degree function of a market index $I_t$, such as the Ibovespa, plus a residue $E_t$:

$$R_t = a + b I_t + E_t . \qquad (10)$$

The coefficients $a$ and $b$ are determined for each equity using simple linear regression, and the residue $E_t$ is defined by equation (10).
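
The single index regression can be sketched as follows. This is an illustration under stated assumptions: NumPy is assumed, the index and stock returns are synthetic, the slopes 0.8 and 1.3 are arbitrary, and the function name is hypothetical.

```python
import numpy as np


def single_index_residuals(returns, index_returns):
    """Fit R_t = a + b I_t + E_t by ordinary least squares for each column of
    `returns` and return (a, b, residuals)."""
    returns = np.asarray(returns, dtype=float)
    index_returns = np.asarray(index_returns, dtype=float)
    design = np.column_stack([np.ones_like(index_returns), index_returns])
    coef, *_ = np.linalg.lstsq(design, returns, rcond=None)
    a, b = coef[0], coef[1]
    residuals = returns - design @ coef
    return a, b, residuals


# Synthetic example: two stocks driven by a made-up index.
rng = np.random.default_rng(3)
index_returns = rng.normal(0.0, 0.02, size=250)
noise = rng.normal(0.0, 0.01, size=(250, 2))
returns = 0.001 + index_returns[:, None] * np.array([0.8, 1.3]) + noise
a, b, residuals = single_index_residuals(returns, index_returns)
```

By construction, the residuals have zero mean and are uncorrelated with the index over the fitting period; the correlation matrix of the columns of `residuals` is the one that would then be used for building portfolios with the market effect removed.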

As an alternative to the use of the Ibovespa as the single index, one may use the index given by the log returns of the portfolio built with the eigenvector corresponding to the highest eigenvalue of the correlation matrix of those same stocks. As we have shown for the data concerning the year 2004, this index and the Ibovespa are very highly correlated, so the results should not change substantially whichever of the two indices is used.

Figures 8b and 8c show, respectively, the volatility of the log returns and of the residuals of PETR4, a stock of Petrobras, an oil and gas company that is one of the largest assets of the BM&F-Bovespa in terms of traded volume. Note that the volatility of the residuals is less prone to effects due to market fluctuations, although those effects are still present in its variability. This remaining nonstationarity will affect the results in the next section.

Fig. 8. Volatility of the Ibovespa (a), of PETR4 (b) and of the residuals of PETR4 (c) from 2004 to 2010.



So, in what follows, we calculate the residuals for all stocks being considered for each pair of years using 
the Ibovespa as the single index. We then proceed to building portfolios using the correlation matrices between 
those residuals and also the cleaned correlation matrices. 



3 Results 

In this section, we describe the results obtained with the procedure presented in Section 2 for each of the years considered in this article. We compare the predicted risk and the realized risk for the years 2005 to 2010, using graphs and calculating AG.

Table 1 presents the results for the measure AG calculated for this method with and without the cleaning of the correlation matrix and with and without the regression for the removal of the market effect, as well as the volatility of the Ibovespa.

                     Without Cleaning              With Cleaning                Ibovespa
Previous-Predicted   No Regression   Regression    No Regression   Regression   Volatility
2004 - 2005          -0.176          -0.105        -0.102           0.021       1.57%
2005 - 2006          -0.212          -0.079        -0.203           0.064       1.53%
2006 - 2007          -0.071           0.018        -0.056          -0.101       1.73%
2007 - 2008          -1.011          -0.220        -0.841          -0.283       3.32%
2008 - 2009           0.259           0.290         0.169           0.263       1.93%
2009 - 2010          -0.056          -0.218        -0.116          -0.148       1.28%

Table 1. Agreement measure of the curves (AG) and the volatility of the Ibovespa.



Table 1 shows that the majority of forecasts were better when regression was used to eliminate the effect of the market movements and when the correlation matrix was cleaned. Moreover, in every year in which volatility dropped from the previous year to the forecasted year (2005, 2006, 2009 and 2010), the cleaning of the correlation matrix gave better results. In other words, when volatility drops with respect to the previous year, it is best to clean the correlation matrix, and when volatility grows with respect to the previous year, it is best not to. An investor may therefore choose the method according to his or her expectations about the volatility of the market, or use both procedures, with and without cleaning the correlation matrix, in order to decide on a strategy.

Figure 9 contains the graphs of return against realized risk (solid lines) and predicted risk (dashed lines), obtained with the original correlation matrix (left plots) and the cleaned correlation matrix (right plots), always using regression to remove the market effect. From these graphs, one may notice that the cleaning procedure worked best for years of low volatility of the Ibovespa.

The year 2005 was relatively quiet for the Bovespa (by then, it had not yet merged with the BM&F). Volatility was therefore low in 2005, and the predictions of risk are better using the residuals of the data for this year. For this period, we chose stocks of the Bovespa which were traded every trading day during the years 2004 and 2005. Those stocks are listed in Appendix A, totaling 61 stocks. For the correlation matrix of 2004 and the returns of 2005, we obtain AG = -0.105, which means that the predicted risk is 10% smaller than the realized risk. For the cleaned correlation matrices, AG = 0.021, which means that the predicted risk is 2% larger than the realized risk.

The year 2006 was also a low volatility year for the Bovespa. For this period, we chose stocks of the Bovespa which were traded every trading day during the years 2005 and 2006. Those stocks are listed in Appendix A, totaling 72 stocks. The predicted risk is 8% smaller than the realized risk for the original correlation matrix, and for the cleaned correlation matrix, the predicted risk is 6% larger than the realized risk.

Although 2006 was a year of low volatility, 2007 saw the beginning of the USA Subprime Mortgage Crisis, so there is a strong difference in volatility between those two years even after removing the market effect. For this period we chose stocks of the Bovespa which were traded every trading day during the years 2006 and 2007, listed in Appendix A, totaling 86 stocks. For the original correlation matrix the predicted risk is 2% larger than the realized risk, and for the cleaned correlation matrix the predicted risk is 10% smaller than the realized risk.

The year 2008 was the most turbulent year in Brazil and in stock exchanges worldwide since the Black Monday crisis of 1987. The USA Subprime Mortgage Crisis spread to become a crisis of trust in financial institutions and a worldwide credit crisis. For this period, we chose stocks of the Bovespa (BM&F-Bovespa from 2008 onwards) which were traded every trading day during the years 2007 and 2008, also listed in Appendix A, totaling 105 stocks. For the original correlation matrix, the predicted risk is 22% smaller than the realized risk, and for the cleaned correlation matrix, the predicted risk is 28% smaller than the realized risk. Across all procedures, this was the year with the worst forecasts.

Volatility remained high in 2009, although in Brazil the international economic crisis did not have the same strength as in other countries. So, the BM&F-Bovespa had a quieter period than some other stock exchanges, although the movement of foreign investment in and out of the market remained high. For this period, we chose stocks of the Bovespa which were traded every trading day during the years 2008 and 2009. Those stocks are listed in Appendix A, totaling 148 stocks. For the original correlation matrix the predicted risk is 29% larger than the realized risk, and for the cleaned correlation matrices the predicted risk is 26% larger than the realized risk.

Volatility remained higher than normal in 2010, and apprehension due to a succession of international financial crises made the market unstable, although less so than in the previous years. For this period we chose stocks of the Bovespa which were traded every trading day during the years 2009 and 2010, listed in Appendix A, totaling 153 stocks. For the original correlation matrix the predicted risk is 22% larger than the realized risk, and for the cleaned correlation matrix the predicted risk is 15% larger than the realized risk.

After the crisis of 2008, the forecasts became poorer when compared with the ones of the previous years, indicating that the larger the change in volatility between consecutive years, the worse the forecasts obtained for portfolios.

Fig. 9. Realized (solid lines) and predicted (dashed lines) returns and risks of portfolios using the original correlation matrices (left plots) and the cleaned correlation matrices (right plots) from 2005 to 2010.



As mentioned before, short selling is usually not freely allowed in financial transactions, mainly due to the increase in risk it may bring to a portfolio. So, we also did the calculations for no short selling, which implies the addition of a constraint specifying that all values in the weight vector $w$ must be positive or zero. The results are summarized in Table 2, and they show that the conclusions for the case with no short selling are basically the same as for the case where short selling is allowed: there is usually a significantly better result when the market effect is removed through a regression and, in years of a small variation of volatility, cleaning the correlation matrices is a better method for forecasting risk, while it fails to forecast risk for years of high volatility.

                    Without Cleaning             With Cleaning                Ibovespa
Previous-Predicted  No Regression   Regression   No Regression   Regression   Volatility
2004 - 2005          0.057          -0.066        0.069           0.001       1.57%
2005 - 2006         -0.128           0.041       -0.104           0.074       1.53%
2006 - 2007         -0.263           0.133       -0.261          -0.178       1.73%
2007 - 2008         -0.462          -0.564       -0.467          -0.606       3.32%
2008 - 2009          0.506           0.322        0.514           0.337       1.93%
2009 - 2010         -0.149           0.207       -0.136           0.176       1.28%

Table 2. Agreement measure of the curves (AG) and the volatility of the Ibovespa without the use of short 
selling. 
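
The no-short-selling constraint described above (all weights positive or zero, summing to one) can be sketched numerically as follows. This is a minimal illustration and not the procedure used to produce Table 2: it assumes a pure minimum-variance objective (the target-return constraint of the full Markowitz problem is left out for brevity) and solves it by projected gradient descent. The function names `project_to_simplex` and `long_only_min_variance` are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    n = v.size
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, n + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def long_only_min_variance(cov, n_iter=5000):
    """Minimize w^T C w subject to w >= 0 and sum(w) = 1 (assumed objective)."""
    n = cov.shape[0]
    w = np.full(n, 1.0 / n)                   # start from equal weights
    # The gradient of w^T C w is 2 C w; its Lipschitz constant is 2 * lambda_max.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(cov)[-1])
    for _ in range(n_iter):
        w = project_to_simplex(w - step * 2.0 * (cov @ w))
    return w

# Toy covariance (hypothetical) for which the unconstrained solution
# would hold a short position in the second asset.
cov = np.array([[0.04, 0.05],
                [0.05, 0.09]])
print(long_only_min_variance(cov))
```

In this toy case the unconstrained minimum-variance weights would be (4/3, -1/3); with the constraint, the weight of the second asset is pushed to zero instead of becoming negative.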

4 Conclusion 

In this article, we used two techniques to clean the covariance matrix used in the building of portfolios 
with Markowitz's theory. The first technique is the use of Random Matrix Theory to clean the 
correlation matrix built from the time series of stock returns in the year prior to the year for which the portfolio 
is to be built. The second technique is the use of a regression model to remove the market effect due to 
the common movement of all stocks. Both are used in order to forecast, with better precision, the risk of a portfolio 
in a particular year using data from the previous year. The data were the time series of returns of the 100% 
liquid assets of the BM&F-Bovespa (those traded on every trading day), covering the years from 2004 to 2010. The aim 
was to combine these two methods in different configurations, and to compare the results in order to obtain the best 
risk forecasts for portfolios. 
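
For concreteness, the first technique can be sketched as below. This is one common variant of Random Matrix Theory cleaning, given here under stated assumptions and not necessarily identical to the procedure applied in this article: eigenvalues of the correlation matrix that fall below the upper edge of the Marchenko-Pastur distribution, (1 + sqrt(N/T))^2 for N stocks and T observations, are treated as noise and replaced by their average, and the matrix is then rebuilt and rescaled to unit diagonal. The function name `clean_correlation` is hypothetical.

```python
import numpy as np

def clean_correlation(returns):
    """Eigenvalue-clipping sketch. `returns` is a T x N array (T days, N stocks)."""
    T, N = returns.shape
    corr = np.corrcoef(returns, rowvar=False)          # N x N correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)             # ascending eigenvalues
    lambda_max = (1.0 + np.sqrt(N / T)) ** 2            # Marchenko-Pastur upper edge
    noisy = eigvals < lambda_max                        # eigenvalues treated as noise
    cleaned_vals = eigvals.copy()
    if noisy.any():
        cleaned_vals[noisy] = eigvals[noisy].mean()     # averaging preserves the trace
    cleaned = (eigvecs * cleaned_vals) @ eigvecs.T      # V diag(values) V^T
    d = np.sqrt(np.diag(cleaned))
    return cleaned / np.outer(d, d)                     # restore unit diagonal

# Synthetic returns (hypothetical): 20 stocks, 250 days, one common factor.
rng = np.random.default_rng(0)
toy_returns = rng.standard_normal((250, 20)) + 0.5 * rng.standard_normal((250, 1))
cleaned = clean_correlation(toy_returns)
print(cleaned.shape)
```

The cleaned correlation matrix is converted back into a covariance matrix by multiplying each entry by the standard deviations of the corresponding pair of stocks.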

Based on a measure of the agreement (AG) between the forecasted and the realized risks, we conclude that 
how close the forecasted risk is to the realized risk depends on whether the volatility of the forecasted year is smaller 
or larger than that of the year used for the forecast. When there is a drop in volatility from one year to 
the next, the agreement measure was best when we built the portfolio using the cleaned 
correlation matrices. When there is a rise in volatility, it is best to use the original correlation matrix. 
This result holds both when the market effect is removed by a regression and when it is maintained. 

The better performance of the cleaned correlation matrices in the building of portfolios 
occurs because the cleaning eliminates the volatility (noise due to the complex interactions between the many stocks) 
which is useless for forecasting the risk of a subsequent year of lower volatility. We noticed an advantage, 
in slightly more than half the cases, in forecasting risk with the market effect removed, but 
without a clear pattern. Last, in the year of the greatest crisis (2008), the use of regression gave 
particularly better predictability. 
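
The removal of the market effect can be sketched in the same spirit. The example below assumes a simple one-factor model, r_i(t) = a_i + b_i r_m(t) + e_i(t), fitted by ordinary least squares for each stock against the returns of a market index, and keeps the residuals e_i(t); the exact regression model used in this article may differ, and the function name `remove_market_effect` is hypothetical.

```python
import numpy as np

def remove_market_effect(stock_returns, market_returns):
    """Residuals of an OLS fit of each stock (columns) on the market returns.

    stock_returns: T x N array; market_returns: array of length T.
    """
    # Design matrix with an intercept column and the market returns.
    X = np.column_stack([np.ones_like(market_returns), market_returns])
    coeffs, *_ = np.linalg.lstsq(X, stock_returns, rcond=None)   # shape 2 x N
    return stock_returns - X @ coeffs

# Synthetic example (hypothetical): three stocks driven by a common market factor.
rng = np.random.default_rng(1)
market = rng.standard_normal(200)
stocks = np.outer(market, [0.5, 1.0, 1.5]) + 0.1 * rng.standard_normal((200, 3))
residuals = remove_market_effect(stocks, market)
print(np.corrcoef(residuals, rowvar=False).round(2))
```

By construction the residuals are uncorrelated with the market returns, so a correlation matrix built from them no longer carries the common movement of all stocks.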

So, the use of regression methods for the removal of market effects is usually advisable, but the use of 
Random Matrix Theory for the removal of noise from the correlation matrices tends to fail when forecasting 
years of high volatility, which are precisely the occasions on which a reliable risk forecast is most needed. 

A Stocks of the BM&F-Bovespa used for the portfolio building 

2004-2005 



AMBV4 BBAS3 BBDC3 BBDC4 BRAP4 BRKM5 BRTO4 CCRO3 CESP5 CGAS5 
CLSC6 CMIG3 CMIG4 CNFB4 CPLE3 CPLE6 CRUZ3 CSNA3 CTNM4 EBTP3 
EBTP4 ELET3 ELET6 EMBR3 FESA4 FFTL4 FIBR3 GGBR4 GOAU4 INEP4 
ITSA4 ITUB4 KLBN4 LAME4 LIGT3 MAGG3 PCAR5 PETR3 PETR4 POMO4 
RAPT4 SBSP3 SUZB5 TBLE3 TCSL3 TCSL4 TLPP3 TLPP4 TMAR5 TNCP3 
TNCP4 TNLP3 TNLP4 TRPL4 UNIP6 USIM5 VALE3 VALE5 VIVO3 VIVO4 
WEGE3 



2005-2006 



AMBV4 BBAS3 BBDC3 BBDC4 BRAP4 BRKM5 BRTO4 CCRO3 CESP5 CGAS5 
CLSC6 CMIG3 CMIG4 CNFB4 CPFE3 CPLE6 CRUZ3 CSNA3 CTNM4 DASA3 
EBTP3 EBTP4 ELET3 ELET6 EMAE4 EMBR3 FFTL4 FIBR3 FJTA4 GGBR3 
GGBR4 GOAU4 GOLL4 GRND3 IDNT3 ITSA4 ITUB4 KLBN4 LAME4 LIGT3 
MAGG3 MGEL4 NATU3 PCAR5 PETR3 PETR4 POMO4 PSSA3 RAPT4 SBSP3 
SUZB5 TBLE3 TCSL3 TCSL4 TELB3 TELB4 TLPP3 TLPP4 TMAR5 TNCP3 
TNCP4 TNLP3 TNLP4 TRPL4 UGPA4 UNIP6 USIM5 VALE3 VALE5 VIVO3 
VIVO4 WEGE3 



2006-2007 



ALLL3 AMBV3 AMBV4 BBAS3 BBDC3 BBDC4 BOBR4 BRAP4 BRKM5 BRTO4 
BTOW3 CCRO3 CGAS5 CLSC6 CMIG3 CMIG4 CNFB4 CPFE3 CPLE6 CRUZ3 
CSAN3 CSNA3 CTAX4 CTNM4 CYRE3 DASA3 ELET3 ELET6 EMAE4 EMBR3 
ENBR3 ENMA3B ETER3 FFTL4 FIBR3 FJTA4 GGBR3 GGBR4 GOAU4 GOLL4 
GRND3 GUAR3 IDNT3 ITSA4 ITUB4 KLBN4 LAME4 LIGT3 LREN3 MAGG3 
MGEL4 NATU3 NETC4 OHLB3 PCAR5 PETR3 PETR4 POMO4 PSSA3 RAPT4 
RENT3 RSID3 SBSP3 SLED4 SUZB5 TAMM4 TBLE3 TCSL3 TCSL4 TELB3 
TELB4 TLPP3 TLPP4 TMAR5 TNLP3 TNLP4 TRPL4 UGPA4 UNIP6 UOLL4 
USIM5 VALE3 VALE5 VIVO3 VIVO4 WEGE3 



2007-2008 



ALLL3 AMBV3 AMBV4 BBAS3 BBDC3 BBDC4 BEES3 BISA3 BRAP4 BRFS3 
BRKM5 BRTO4 BTOW3 CARD3 CCRO3 CESP6 CGAS5 CLSC6 CMIG3 CMIG4 
CNFB4 COCE5 CPFE3 CPLE6 CRUZ3 CSAN3 CSMG3 CSNA3 CTNM4 CYRE3 
DASA3 ECOD3 ELET3 ELET6 ELPL4 EMBR3 ENBR3 EQTL3 ESTR4 ETER3 
FFTL4 FIBR3 FJTA4 GETI3 GETI4 GFSA3 GGBR3 GGBR4 GOAU4 GOLL4 
GRND3 IDNT3 IMBI4 ITSA4 ITUB4 JBDU4 KLBN4 LAME4 LIGT3 LPSB3 
LREN3 LUPA3 MAGG3 MDIA3 MLFT4 NATU3 NETC4 ODPV3 OHLB3 PCAR5 
PETR3 PETR4 POMO4 POSI3 PSSA3 RAPT4 RENT3 RNAR3 RSID3 SBSP3 
SLED4 SUZB5 TAMM4 TBLE3 TCSL3 TCSL4 TELB3 TELB4 TLPP3 TLPP4 
TMAR5 TNLP3 TNLP4 TOTS3 TRPL4 UGPA4 UNIP6 UOLL4 USIM3 USIM5 
VALE3 VALE5 VIVO4 VLID3 WEGE3 



2008-2009 



ABCB4 ALLL3 AMAR3 AMBV3 AMBV4 AMIL3 BAZA3 BBAS3 BBDC3 BBDC4 
BEEF3 BEES3 BEMA3 BICB4 BISA3 BMTO4 BPNM4 BRAP4 BRFS3 BRKM5 
BRML3 BRSR6 BRTO4 BTOW3 BVMF3 CARD3 CCIM3 CCRO3 CESP6 CGAS5 
CLSC6 CMIG3 CMIG4 CNFB4 COCE5 CPFE3 CPLE6 CRUZ3 CSAN3 CSMG3 
CSNA3 CYRE3 CZRS4 DASA3 DAYC4 ECOD3 ELET3 ELET6 ELPL4 EMBR3 
ENBR3 EQTL3 ESTR4 ETER3 EVEN3 EZTC3 FFTL4 FHER3 FJTA4 GETI3 
GETI4 GFSA3 GGBR3 GGBR4 GOAU4 GOLL4 GPIV11 GRND3 IDNT3 IGTA3 
INPR3 ITSA4 ITUB3 ITUB4 JBDU4 JBSS3 JHSF3 KEPL3 KLBN4 LAME4 
LIGT3 LOGN3 LPSB3 LREN3 LUPA3 MAGG3 MILK11 MLFT4 MMXM3 MRFG3 
MRVE3 MULT3 NATU3 NETC4 ODPV3 OHLB3 PCAR5 PDGR3 PETR3 PETR4 
PINE4 PLAS3 PMAM3 POMO4 POSI3 PRVI3 PSSA3 RAPT4 RDCD3 RDNI3 
RENT3 ROMI3 RSID3 SBSP3 SFSA4 SLCE3 SLED4 SMTO3 SULA11 SUZB5 
TAMM4 TBLE3 TCSA3 TCSL3 TCSL4 TELB3 TELB4 TGMA3 TLPP3 TLPP4 
TMAR5 TNLP3 TNLP4 TOTS3 TOYB3 TOYB4 TRPL4 UGPA4 UNIP6 UOLL4 
USIM3 USIM5 VALE3 VALE5 VIVO4 VLID3 WEGE3 WSON11 







2009-2010 



ABCB4 ALLL3 AMAR3 AMBV3 AMBV4 AMIL3 BBAS3 BBDC3 BBDC4 BEEF3 
BEMA3 BICB4 BISA3 BPNM4 BRAP4 BRFS3 BRKM5 BRML3 BRSR6 BRTO4 
BTOW3 BVMF3 CARD3 CCIM3 CCRO3 CESP6 CLSC6 CMIG3 CMIG4 CNFB4 
COCE5 CPFE3 CPLE6 CRDE3 CRUZ3 CSAN3 CSMG3 CSNA3 CYRE3 DASA3 
ECOD3 ELET3 ELET6 ELPL4 EMBR3 ENBR3 EQTL3 ESTR4 ETER3 EVEN3 
EZTC3 FESA4 FFTL4 FHER3 FJTA4 GETI3 GETI4 GFSA3 GGBR3 GGBR4 
GOAU4 GOLL4 GPIV11 GRND3 GSHP3 HBOR3 HYPE3 IDNT3 IGTA3 INEP4 
INPR3 ITSA4 ITUB3 ITUB4 JBDU4 JBSS3 JHSF3 KEPL3 KLBN4 KROT11 
LAME3 LAME4 LIGT3 LLXL3 LOGN3 LPSB3 LREN3 LUPA3 MAGG3 MILK11 
MLFT4 MMXM3 MNDL4 MPXE3 MRFG3 MRVE3 MULT3 MYPK3 NATU3 NETC4 
ODPV3 OGXP3 OHLB3 PCAR5 PDGR3 PETR3 PETR4 PINE4 PLAS3 PMAM3 
POMO4 POSI3 PRVI3 PSSA3 RAPT4 RDCD3 RENT3 RSID3 SBSP3 SFSA4 
SLCE3 SLED4 SMTO3 SULA11 SUZB5 TAMM4 TBLE3 TCSA3 TCSL3 TCSL4 
TELB3 TELB4 TGMA3 TLPP3 TLPP4 TMAR5 TNLP3 TNLP4 TOTS3 TOYB3 
TOYB4 TPIS3 TRPL4 UGPA4 UNIP6 UOLL4 USIM3 USIM5 VALE3 VALE5 
VIVO4 VLID3 WEGE3 